16 research outputs found

    Selecting Contextual Peripheral Information for Answer Presentation: The Need for Pragmatic Models

    This paper explores the possibility of presenting additional contextual information as a method of answer presentation in Question Answering. In particular, the paper discusses the results of employing Bag of Words (BoW) and Bag of Concepts (BoC) models to retrieve contextual information from a Linked Data resource, DBpedia. DBpedia provides structured information on a wide variety of entities in the form of triples. We utilize the QALD question sets, consisting of 100 instances in the training set and another 100 in the testing set. The questions are categorized into single-entity and multiple-entity questions based on the number of entities mentioned in the question. The results show that both BoW (syntactic) and BoC (semantic) models are not capable enough to select contextual information for answer presentation. The results further reveal that pragmatic aspects, in particular pragmatic intent and pragmatic inference, play a crucial role in contextual information selection for answer presentation.
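    To make the retrieval setup concrete, the sketch below shows the Bag of Words side: DBpedia triples are flattened to surface text and ranked against the question by cosine similarity over term counts. The triples, the entity, and the verbalization are illustrative assumptions, not the paper's implementation.

```python
import math
import re
from collections import Counter

def bow(text):
    """Bag of Words: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank_triples(question, triples, k=3):
    """Rank verbalized (subject, predicate, object) triples against the question."""
    q = bow(question)
    return sorted(triples, key=lambda t: cosine(q, bow(" ".join(t))), reverse=True)[:k]

# Hypothetical DBpedia-style triples for the entity "Berlin"
triples = [("Berlin", "country", "Germany"),
           ("Berlin", "population total", "3769495"),
           ("Berlin", "mayor", "Kai Wegner")]
print(rank_triples("What is the population of Berlin?", triples, k=2))
```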

    RealText-asg: A Model to Present Answers Utilizing the Linguistic Structure of Source Question

    Recent trends in Question Answering (QA) have led to numerous studies focusing on presenting answers in a form which closely resembles a human-generated answer. These studies have used a range of techniques, drawing on the structure of knowledge, generic linguistic structures, and template-based approaches, to construct answers as close as possible to a human-generated answer, referred to as human-competitive answers. This paper reports the results of an empirical study which uses the linguistic structure of the source question as the basis for a human-competitive answer. We propose a typed-dependency-based approach to generate an answer sentence, where the linguistic structure of the question is transformed and realized into a sentence containing the answer. We employ the factoid questions from the QALD-2 training question set to extract typed dependency patterns based on the root of the parse tree. Using the identified patterns, we generate a rule set which is used to produce a natural language sentence containing the answer extracted from a knowledge source, realized as a linguistically correct sentence. The evaluation of the approach was performed using the QALD-2 testing factoid question set, achieving 78.84% accuracy. The top-10 patterns extracted from the training dataset were able to cover 69.19% of the test questions.
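    The pipeline can be illustrated with a small sketch using spaCy (an assumption; the abstract does not name its parser): extract the typed dependency pattern rooted at the parse-tree root, then apply one illustrative rule that rewrites a wh-question into a declarative answer sentence. The example question, answer, and rewrite rule are hypothetical.

```python
import spacy  # assumes the en_core_web_sm model is installed

nlp = spacy.load("en_core_web_sm")

def dependency_pattern(question):
    """Typed dependency pattern rooted at the parse-tree root,
    e.g. 'Who is the mayor of Berlin?' -> ('be', ['attr', 'nsubj', 'punct'])."""
    doc = nlp(question)
    root = [t for t in doc if t.dep_ == "ROOT"][0]
    return root.lemma_, sorted(c.dep_ for c in root.children)

def realize(question, answer):
    """One illustrative rule: substitute the wh-word with the answer and
    restate the question as a declarative sentence."""
    doc = nlp(question)
    words = [answer if t.tag_ in ("WP", "WDT", "WRB") else t.text
             for t in doc if not t.is_punct]
    sentence = " ".join(words)
    return sentence[0].upper() + sentence[1:] + "."

print(dependency_pattern("Who is the mayor of Berlin?"))
print(realize("Who is the mayor of Berlin?", "Kai Wegner"))
```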

    Lexicalizing DBpedia with Realization Enabled Ensemble Architecture: RealText-lex2 Approach

    Abstract. DBpedia encodes massive amounts of open-domain knowledge and is growing by accumulating more triples at the same rate as Wikipedia. However, applications often require natural language formulations of these triples to present the information as natural text. The RealText-lex2 framework offers a scalable platform to transform these triples into natural language sentences using lexicalization patterns. The framework has evolved from its previous version (RealText-lex) and comprises four lexicalization pattern mining modules which derive patterns from a training triple collection. These patterns can then be applied to new triples, provided that they satisfy a defined set of constraints.
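    As a hedged sketch of what pattern-based lexicalization looks like in practice (the pattern store, predicates, and constraint checks below are illustrative assumptions, not the RealText-lex2 internals):

```python
# Each mined pattern pairs a sentence template with a constraint that the
# triple must satisfy before the pattern may be applied.
PATTERNS = {
    "birthPlace": ("{s} was born in {o}.", lambda o: not o.isdigit()),
    "populationTotal": ("{s} has a population of {o}.", lambda o: o.isdigit()),
    "author": ("{s} was written by {o}.", lambda o: True),
}

def lexicalize(s, p, o):
    """Turn one (subject, predicate, object) triple into a sentence,
    if a pattern exists for the predicate and its constraint holds."""
    if p in PATTERNS:
        template, constraint = PATTERNS[p]
        if constraint(o):
            return template.format(s=s, o=o)
    return None  # no applicable pattern

print(lexicalize("Albert Einstein", "birthPlace", "Ulm"))
print(lexicalize("Berlin", "populationTotal", "3769495"))
```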

    RealText Lexicalization demonstration

    This is a video demonstration of the RealText Lexicalization framework.

    Gender Classification - Dataset

    Gender Classification dataset

    A framework for generating informative answers for Question Answering systems

    Recent trends in Question Answering (QA) systems have led to a proliferation of studies focused on building advanced QA systems able to compete with the QA ability of humans. To this effect, a large number of these systems have shifted to Question Answering over Linked Data (QALD). The use of Linked Data as the basis of knowledge representation by QA systems has led to noticeable improvements in both recall and precision compared to conventional, unstructured text based systems. However, answers from these systems are still not able to mimic human-generated answers, which has been an ambition for Artificial Intelligence (AI) researchers for more than a decade. One of the two main reasons for the "machine feel" of the answers has been the inability of QA systems to present the answer as a fully constructed, natural language sentence. The second reason is that humans generally answer a question with an elaboration containing additional contextual information, apart from the specific answer to the question. This aspect has been especially challenging for QA systems, as it is difficult to source the contextual information, rank it, and then formulate it as multiple sentences in a form that resembles human-generated text. Previous research has investigated answer presentation by summarizing unstructured text, selecting contextual information from a closed domain ontology, and using cooperative and user-tailored answers; however, these studies have not dealt with the generation of an answer in natural language with additional contextual information.

This thesis describes a framework, RealText, which presents an answer from a QA system in natural language form, together with additional contextual information. This answer, referred to hereafter as an informative answer, comprises a sentence which presents the answer in natural language and, in addition, an elaboration of the entities contained in both the question and the answer. The information required to generate the elaborations was retrieved from DBpedia, an open domain Linked Data resource considered to be the nucleus of the ever-growing Linked Data cloud. Linked Data is represented in a structured form as triples, which enables the required information to be selected for the identified entities without the ambiguity inherent in unstructured text summarization. At the current rate of growth, Linked Data is set to become far more prevalent, with many more Linked Data resources being linked to DBpedia, making it a central hub for the Linked Data cloud. This puts architectures that use DBpedia as the knowledge source at an advantage.

The generation of an elaboration paragraph based on the structured information contained in the triples requires several steps. The triples first need to be lexicalized, that is, transformed into basic stand-alone sentences. These sentences then need to be further enhanced and meshed into a paragraph using linguistic tasks such as aggregation and referring expression generation. The RealText framework integrates these linguistic processes, as used by humans, to generate a paragraph consisting of multiple sentences. Additionally, the framework implements realization functions and inference over, inter alia, the gender and ontology class of an entity, to make the text more akin to human-generated text.
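    As a rough illustration of the meshing step described above, the sketch below joins stand-alone triple sentences into a paragraph, substituting a gender-inferred pronoun as a referring expression after the first mention. The sentences, the gender argument, and the single-pronoun rule are hypothetical simplifications, not the framework's actual linguistic processing.

```python
def elaborate(entity, sentences, gender="neuter"):
    """Mesh stand-alone triple sentences into an elaboration paragraph:
    keep the entity name in the first sentence, then substitute a
    referring expression (a pronoun inferred from gender)."""
    pronoun = {"male": "He", "female": "She", "neuter": "It"}[gender]
    out = [sentences[0]]
    for s in sentences[1:]:
        out.append(s.replace(entity, pronoun, 1))
    return " ".join(out)

# Hypothetical lexicalized triples for one entity
sentences = ["Albert Einstein was born in Ulm.",
             "Albert Einstein developed the theory of relativity.",
             "Albert Einstein died in Princeton."]
print(elaborate("Albert Einstein", sentences, gender="male"))
# -> Albert Einstein was born in Ulm. He developed the theory of
#    relativity. He died in Princeton.
```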
    We used the QALD evaluation campaign dataset, which contains the question, the query, and the answer, as the source data. Since we were working at the final answer presentation stage, extraction of the answer was out of scope for this project. Additionally, the framework uses all of the triples associated with a given entity, and hence does not focus on ranking triples. The triples used in contextual information generation are provided by DBpedia, the structured counterpart of Wikipedia.

The evaluation of research of this nature is challenging for two reasons: firstly, there is no benchmark data available, and secondly, evaluation of natural text can only be done accurately by human evaluators, which is expensive in terms of both money and time. We evaluated the RealText framework on three criteria: readability, accuracy, and informativeness. These criteria are highly subjective and difficult to measure as definite scientific variables, and it is far more challenging still to measure them with automated systems. To validate this research, we principally used human participants to evaluate the "naturalness" of the generated text, computing inter-annotator agreement to ensure a minimum threshold of agreement between the participants. In addition, we investigated several automated metrics to see if any of them correlated with the human evaluations. The results showed that more than 95% of the generated answers achieved an average rating above three out of five on all of the criteria. Furthermore, 39.02% of the generated answers achieved an average rating above four for readability, while the corresponding values for accuracy and informativeness were in the vicinity of 66%. The investigation into the automated metrics showed that none of them correlated with the human evaluations.

In summary, this thesis presents a framework able to generate multi-sentence "natural text" for a given set of entities by extracting information from a Linked Data knowledge base such as DBpedia. The framework is robust enough to generate text for any given set of entities, and is hence extendable to other natural language generation tasks, such as description text generation for kiosks, dialogue systems for Intelligent Personal Assistants (IPAs), patient summary generation in eHealth, and narrative generation in eLearning applications.
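    The reported figures take the following form: per-answer ratings from multiple evaluators are averaged and then thresholded. A minimal sketch with hypothetical ratings (the study's actual rating data are not reproduced here):

```python
def rating_summary(ratings_per_answer):
    """Fraction of answers whose mean rating (1-5 scale) clears each threshold."""
    means = [sum(r) / len(r) for r in ratings_per_answer]
    return {t: sum(m > t for m in means) / len(means) for t in (3, 4)}

# Hypothetical readability ratings: four answers, three judges each
ratings = [[4, 5, 4], [3, 4, 4], [5, 4, 5], [2, 3, 3]]
print(rating_summary(ratings))  # {3: 0.75, 4: 0.5}
```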